Addressing Big Data with Hadoop
Abstract
Nowadays, large volumes of data are produced by various sources such as social media networks, sensor devices, and other information-serving devices. This large collection of unstructured and semi-structured data is called big data. Conventional databases and data warehouses cannot process this data, so new data processing tools are needed. Hadoop addresses this need. Hadoop is an open-source platform that provides distributed computing for big data. It is composed of two components: a storage model called the Hadoop Distributed File System (HDFS) and a computing model called MapReduce. MapReduce is a programming model for handling large, complex tasks in two steps, called map and reduce. In the map stage, the master node partitions the problem into sub-problems and distributes them to worker nodes; after solving their sub-problems, the worker nodes pass the results back to the master node. In the reduce stage, the master node combines the answers to the sub-problems into a final solution.
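To make the map and reduce steps concrete, here is a minimal single-machine sketch in Python. It is not Hadoop's API (Hadoop jobs are normally written against its Java `Mapper`/`Reducer` classes); the word-count task, the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count`, and the use of a thread pool to stand in for worker nodes are all illustrative assumptions. It follows the simplified master/worker description above; in a real Hadoop cluster the grouping (shuffle) and reduce steps are also distributed across worker nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Worker task (hypothetical): emit a (word, 1) pair for every word in its chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key so each key can be reduced independently."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all partial counts for one key into a final count."""
    return key, sum(values)


def word_count(document: str, num_workers: int = 3) -> Dict[str, int]:
    # "Master" step: partition the input into sub-problems (here, one per line).
    chunks = document.splitlines()
    # Distribute the map tasks to a thread pool standing in for worker nodes.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        mapped = pool.map(map_phase, chunks)
        intermediate = [pair for result in mapped for pair in result]
    # The master collects the workers' output, groups it by key, and reduces it.
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    text = "big data needs new tools\nhadoop processes big data\nmapreduce splits big jobs"
    print(word_count(text))
```

With the sample text, the returned dictionary maps "big" to 3 and "data" to 2. Each map task touches only its own chunk and each reduce call touches only one key, which is what lets a framework like Hadoop spread both phases over many machines.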
Publication date: 2014